Improvements in Determining Protein Subcellular Location
Abstract
Knowing where a protein occurs in the cell is an important step towards understanding its function [1]. A method that accurately predicts subcellular location would therefore be valuable for interpreting the data produced by sequencing projects. Among the available methods (e.g. searching for signal peptides, or inferring location from sequence homology), the correlation between a protein's total amino acid composition and its subcellular location is the most studied. In this context, this paper introduces a new physicochemical attribute relevant to determining protein subcellular location in the yeast database. It was identified through a series of tests involving multiple Artificial Neural Networks (ANNs), followed by Linear Discriminant Analysis (LDA) to explain the results reported by the ANNs. Two improvements were obtained: first, better classification scores than those reported in previous work; second, a better choice of attributes, which improved the results further. The yeast database [4] was used in two previous works [2][3], both of which tried to predict protein subcellular location using different techniques. There is some doubt about the methodology used to test the ANNs developed by Cairns [2]. The approach in this paper used a training process based on n-fold cross-validation, with n chosen to suit the available data. Instead of a single network, multiple networks were developed, which minimized the impact of inter-class interference during learning. In addition, a series of tests explored the space of possible network architectures. Misclassification was observed between some classes; for example, the Cytoplasm and Nuclear sites interfered with each other, causing many classification errors. LDA was then used in an attempt to explain these errors.
The LDA results identified one completely useless attribute in the database and characterized conflicts among the values of the attributes that were supposed to identify the location. Having identified the need for a new physicochemical attribute to improve the results, we tested 16 candidates obtained from the Yale University database on a subset of 80 randomly chosen proteins: 40 correctly classified cases (20 cytoplasmic and 20 nuclear) and 40 misclassified cases (20 cytoplasmic proteins reported as nuclear, and vice versa). Using the protein's isoelectric point improved the classification score from 60% to 72.5%. As future work, we intend to repeat the ANN tests, replacing the useless attribute with the isoelectric point.
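The n-fold cross-validation scheme mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the value of n or how the folds were drawn, so the function names `k_fold_indices` and `correct_count`, the contiguous fold layout, and the choice of 10 folds are all assumptions made purely for illustration, not the authors' procedure.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists so each sample is tested exactly once.

    Hypothetical helper: folds are contiguous blocks; a real experiment
    would normally shuffle (and possibly stratify) the samples first.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def correct_count(score_percent: float, n_cases: int) -> int:
    """Convert a classification score in percent into a count of correct cases."""
    return round(score_percent * n_cases / 100)


# Assumed setting: 80 samples split into 10 folds (n = 10 is NOT from the paper).
folds = list(k_fold_indices(80, 10))
before = correct_count(60, 80)    # score reported before adding the new attribute
after = correct_count(72.5, 80)   # score reported with the isoelectric point
```

On the 80-protein subset, the reported scores of 60% and 72.5% correspond to 48 and 58 correctly classified proteins, a gain of 10 proteins.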
Publication date: 2003